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Abstract 

We present a sublinear randomized algorithm to compute a sparse Fourier transform for 
nonequispaced data. Suppose a signal S is known to consist of N equispaced samples, of 
which only L < N are available. If the ratio p = L/N is not close to 1, the available data 
are typically non-equispaced samples. Then our algorithm reconstructs a near-optimal i?-term 
representation R with high probability 1—5, in time and space poly (B,\og(L), log p,log(l/ 6), 
e _1 ), such that \\S — R\\ 2 < (1 + e)\\S — R^ pt \\ 2 , where R^ pt is the optimal £>-term Fourier 
representation of signal S. The sublinear poly(logL) time is compared to the superlinear 
0(N log N + L) time requirement of the present best known Inverse Nonequispaced Fast 
Fourier Transform (INFFT) algorithms. Numerical experiments support the advantage in speed 
of our algorithm over other methods for sparse signals: it already outperforms INFFT for large 
but realistic size N and works well even in the situation of a large percentage of missing data 
and in the presence of noise. 

1 Introduction 

We consider the problem in which the recovery of a discrete time signal S of length N is sought 
when only L signal values are known. In general, this is of course an insoluble problem; we 
consider it here under the additional assumption that the signal has a sparse Fourier transform. Let 
us fix the notations: the signal is denoted by S = (S(t))t=o,...,N-i, but we have at our disposal only 
the (S(i))i e T, where the set T is a subset of {0, . . . , N — 1} and |T| = L. The Fourier transform 
of signal S is S = (5(0), . . . , S(N - 1)), denned by S(u) = 4- J]^ 1 S(t)e~ 2niujt/N . In terms 

of the Fourier basis functions = -±= e 2muJt / N , S can be written as S = J2^=o 5(^)0 a; (t); 

this is the (discrete) Fourier representation of S. A signal S is said to have a 5-sparse Fourier 
representation, if there exists a subset C {0, . . . , N — 1} with |0| = B, and values c(u>) ^ 
for uj E T, such that S(t) = XLen c ( UJ ) ( f } ^- F° r a signal that does not have a £>-sparse Fourier 
representation, we denote by R^ pt (S) the optimal £>-term Sparse Fourier representation of S. 
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This paper presents a sublinear algorithm to recover a B- sparse Fourier representation of a 
signal S from incomplete data. Our algorithm also extends to the case where the Fourier transform 
S is not -B-sparse, where we aim to find a near-optimal £>-term Fourier representation, i.e. R = 
XLer c(w)0 w , such that 

\\S - R\\ = \\S - £ <<»)4>4l < (1 + e)\\S ~ RU S )Wl d) 

A typical situation where our study applies is the observation of non-equispaced data, where the 
samples are nevertheless all elements of rZ for some r > 0. For a signal with evenly spaced data, 
the famous Fast Fourier Transform (FFT) computes all the Fourier coefficients in time 0(N log N). 
However, the requirement of equally distributed data by FFT raises challenges for many important 
applications. For instance, because of the occurrence of instrumental drop-outs, the data may 
be available only on a set of non-consecutive integers. Another example occurs in astronomy, 
where the observers cannot completely control the availability of observational data: a telescope 
can only see the universe on nights when skies are not cloudy. In fact, computing the Fourier 
representation from irregularly spaced data has wide applications [ 19 1 in processing astrophysical 
and seismic data, the spectral method on adaptive grids, the tracking of Lagrangian particles, and 
the implementation of semi-Lagrangian methods. 

In many of these applications, a few large Fourier coefficients already capture the major time- 
invariant wave-like information of the signal, and we can thus ignore very small Fourier coeffi- 
cients. To find a small set of the largest Fourier coefficients and hence a (near) optimal £>-sparse 
Fourier representation of a signal that describes most of the signal characteristics is a fundamental 
task in applied Fourier Analysis. 

An equivalent version of this problem is as follows: define the matrix A := (e 2mkt: >)k=o,...,N; 
j=o...,L-i, where the tj are the locations of the available samples. Given S(tj), we want to recon- 
struct the signal S, or equivalently, its Fourier coefficients §k, so that AS = S. This linear system 
is over-determined. Several algorithms KllfTTl llT2l have provided efficient approaches to solve 
this problem. Among all INFFT algorithms, the iterative CGNE approach of |6| in the benchmark 
software NFFT 2.0 is one of the fastest methods; it takes time 0(L 1+ ( d ~ 1 )/' 8 log L), where L is the 
number of available points, d is the number of dimensions, and f3 > 1 is the smoothness for the 
original signal. The super-linearity relationship between the running time and iV (recall L = pN, 
where p is the percentage of available data) poses difficulties in processing large dimensional sig- 
nals, which have nothing to do with the unequal spacing. It follows that identifying a sparse number 
of significant modes and amplitudes is expensive for even fairly modest N. Our goal in this paper 
is to discuss much faster (sublinear) algorithms that can identify the sparse representation or ap- 
proximation with coefficients ai, . . . ,a B and modes ui, . . . , uj b for unevenly spaced data. These 
algorithms will not use all the samples S(Q), . . . , S(N — 1), but only a very sparse subset of them. 

Our approach is based on the paper [ 8 1 that shows how to construct the Fourier representation 
for a signal S with £> -sparse Fourier representation in time and space poly(B, log N, 1/e, log(l/5)) 
on equal spacing data. The algorithm contains some random elements (which do not depend on 
the signal); their approach guarantees that the error of estimation is of order e||S|| 2 with proba- 
bility exceeding 1 — 5. The ideas in [8] have also been applied by its authors to sparse wavelet, 
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wavelet packet representation, and histograms [7]. We have dubbed the whole family of algorithms 
RAiSTA (for Randomized Algorithm for Sparse Transform Approximation); when dealing only 
with Fourier Transforms, as is the case here, we specialize it to RA^SFA (F for Fourier). Zou, 
Gilbert, Strauss and Daubechies [ 20 1 improved and implemented the algorithm greatly. It convinc- 
ingly beats FFT when the number of grid points N is reasonably large. The crossover point lies at 
iV ~ 25, 000 in one dimension, and at iV ~ 460 for data on a iV x iV grid in two dimensions for a 
two-mode signal. When B = 13, RA£SFA surpasses FFT at N > 300, 000 for one dimensional 
signals and 1100 for two dimensional signals. 

In this paper, we modify RA£SFA to solve the irregularly spaced data problem. The new 
NERAfSFA (Nonequispaced RAfSFA) uses sublinear time and space poly(B, logL, e, log(l/<5), 
log p) to find a near-optimal 5-term Fourier representation, such that \\S— R\\ 2 < (1+e) \\S— R op t\\ 2 
with high probability 1 — 5. Similar to the RA^SFA algorithm, it outperforms existing INFFT 
algorithms in processing sparse signals of large size. 

Notation and Terminology Denote by xt a signal that equals 1 on a set T and zero elsewhere 
in the time domain. We say a signal H is q percent pure, if there exists a frequency cu and a signal 
p, such that H = ae 2mu)t l N + p, with \a\ 2 > || 2 . To quantify the unevenness of the data, 

introduce a parameter p = L/N to be the percentage of the available data over all the data, where 
L is the number of available data. Obviously a larger p corresponds to more information about the 
signal. We use L 2 -norm throughout the paper, which is denoted by || . || . The convolution F * G is 
defined as F * G(t) = J2 S F(s)G(t - s). It follows that fTg(uj) = \fNF{u)G{uu). 

A Box-car filter with width 2 k + 1 is defined as follows: 



A dilation operation on signal H with a dilation factor a is defined as H^(t) = H(at) for 
every points t. 

Organization The paper is organized as follows. In Section 2, we give the outline of the 
RAfSFA algorithm. Section 3 presents the modification of RA^SFA that deals with the unavail- 
ability of some samples by a greedy method. In Section 4, an interpolation technique is introduced 
for better performance. Finally, we compare numerical results with existing algorithms in Section 
5. 



Given a signal S of length N, the optimal 5-term Fourier representation RfL.(S) uses only B 
frequencies; it is simply a truncated version of the Fourier representation of S, retaining only the 
B largest coefficients. The following theorem is the main result of [ 8 1 . 




if -k<t<k, 
if t > k or t < —k 



In the frequency domain, this filter is in the form of 




(2) 



2 Set-up of RA^SFA 
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Theorem 2.1. Let an accuracy factor e, a failure probability 5, and a sparsity target B 6 N, B -C 
iV Z?e given. Then for an arbitrary signal S of length N, RAiSFA will find a B-term approximation 
R to S, at a cost in time and space of order poly(B, log(iV), 1/e, log(l/<5)) and with probability 
exceeding 1 - 5, so that \\S - R\\ 2 < (1 + e)\\S - Ro pt (S)\\ 2 2 . 

The striking fact is that RA£SFA can build a near-optimal representation R in sublinear time 
polyilogN) instead of the 0(iV log N) time requirement of other algorithms. Its speed surpasses 
FFT as long as the length of a signal is sufficiently large. If a signal is composed of only B modes, 
RA£SFA constructs S without any error. 

The main procedure is a Greedy Pursuit with the following steps: 

Algorithm 2.2. Total Scheme EOI 

1. Initialize the representation signal R to 0. Set the maximum number of iterations ITER = 
B\og(N)\og(l/5)/e 2 . 

2. Test whether \\S — R\\ appears to be less than some user threshold, i. If yes, return the 
representation signal R and the whole algorithm ends; else go to step 3.. 

3. Locate Fourier Modes uj for the signal S — Rby isolation and group test procedures. 

4. Estimate Fourier Coefficients at uj: (S — R)(uj). 

5. Update the representation signal R <— R+ (S — i?)(a;)0 u; (t). 

6. If the total number of iterations is less than ITER, go to 2; else return the representation 
R. 

The basic idea of Algorithm 12.21 is to identify significant frequencies and then estimate their 
corresponding coefficients. In order to locate those nonzero frequencies, we first construct a new 
signal where a previous significant frequency becomes predominant. Then a recursive approach 
called group test finds the exact label of this predominant mode, by splitting intervals, comparing 
energies, and keeping only intervals with large energies. After the frequency is located, coefficient 
estimation procedures give a good estimation by taking means and medians of random samples. 



3 NERA^SFA with Greedy Technique 

RA£SFA samples from a signal, implicitly assuming that uniform and random sampling is pos- 
sible, with a fixed cost per sample. This raises challenges for processing unevenly spaced data. 
Specifically speaking, Fourier coefficients and norms can not be estimated properly. Thus one has 
to modify steps 3 and 4 accordingly. In this section, NERA£SFA, a modified version of RA£SFA 
with greedy technique, is introduced to overcome these problems. 

The basic idea is a greedy pursuit for an available data point. Whenever the algorithm samples 
at a missing data point, it searches some other random indices t until it finds one available data 
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point S(t) as the substitute. This technique is used in estimating both Fourier coefficients and 
norms. 

A good data structure is important to save running time cost. We denote the availability of 
a data point by a label, say +1 for available and for unavailable. Hence, the label is tested to 
see if its corresponding sample is valid. An alternative solution is to store all the sorted labels 
of available data in a long list. However, each search takes time 0(log(iV)), which introduces a 
0(log N) 2 factor into the whole computation. As the empirical results show, the running time of 
NERA£SFA algorithm is linear to log N. For this reason, we selected the first method. 

We now give a more detailed discussion of the different procedures used in steps 3 and 4 of 
Algorithmic 

3.1 Estimating Fourier Coefficients 

First, we give the procedure for estimating Fourier coefficients for unevenly spaced data as follows. 

Algorithm 3.1. Estimating Individual Fourier Coefficients 

Input a signal S, a frequency u>, n = 21og(l/<5), m = 8/e 2 . 

1. For i = 1, . . . , n 

2. For j = 1 , . . . , m 

Randomly generate the index t until S(t) is available. 
Then letUj = t. Evaluate k{Uj) =< S(tij),<f>,j(tij) >■ 

3. Take the means ofm samples kfaj), i.e. p(i) = Y^jLi k(Uj)> where i = 1, . . . , n. 

4. Take the median of n samples c = mediani(p(i)), where i — 1, . . . , n. 

5. Return c as the estimation of the Fourier coefficient S(uj). 

Next, we show that using unevenly spaced data leads to a very good approximation to the 
true coefficient. The first lemma is one of most fundamental theorems in randomized algorithms. 
It essentially states that by repeating an experiment enough times, a small probability event will 
happen eventually. 

Lemma 3.2. If an event happens with probability p, then in the first k > log 5/ log(l— p) iterations, 
it happens at least once with success probability 1 — 5. 

In our case, only p = L/N percentage of the data is available, so that k > log 6/ log(l — L/N) 
trials are needed to generate one available data point with success probability at least 1 — 5. 

In fact, most of the Fourier coefficients of a characteristic function on a typical set T are small, 
under some conditions. The following lemma makes this more explicit. 
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Lemma 3.3. Suppose the components Xj of a discrete random variable X = (Xj) J=0 are iden- 
tically and independently distributed in {0, 1}, with p = Prob(Xj = 1). Define the random set 
T = {j G {0, . . . , N — l}\Xj = 1} to be the set of all available data; Xt(w) is the Fourier 

transform of xr(t) = T!^=q X r l fV > 1+{N U)x^ ' then 

Prob{\ XT {uo)\ 2 >\)<T 2 . (3) 



Proof First, we claim that E( \ xt (w) | 2 ) < ^riy- 
Since x t (uj) = ± E je T(e 2 ™ j/7V ), we have 



y j,ker 

= 1 V 1 4- 1 V P ^Mj'k)/N 

It follows that 

£r|y r MI 2 ) = — + -^—y pN - 1 V e***»U-W N 
u{\Xt{uj)\ ) + p 2 N 2 N — I ^ 

^ ^ j,k=0,j^k 

Observe that e^C*-*)/* = | g^/jvp _ ^-i j = (^^2 _ ^ hence 

= ± + - ^ = p ^ {l + ^(^,o - 1)} 

{7V-l + (p7V-l)(iV^ i0 -l)}. 



P N(N - 1) 
By Markov's Inequality, when w ^ 0, we have 



Prob(\xT^)\ 2 > A) < 



» , w E(\xtH\ 2 ) 1-P 



A p(JV-l)A' 
Since p > 1+ ( jV l 1 - )Ar 2 » it follows that 

Pro6(|xT(^)| 2 > A) < r 2 . 
That is , for any uj ^ 0, with probability at least 1 — t 2 

|xtH|<VA. (5) 

□ 
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In particular, we want both A and r to be small, meaning that p cannot be too small itself. 
Next, we consider the conditions for the two coefficients S(u) and Si(u) = S ■ Xt(^>) to be 
close. 

Lemma 3.4. Suppose the parameters T, S, Xrif), \ T , V are as stated in Lemma [Ol and define 
Si(t) = S(t)x T (t)- Ifp > i +(jv \) At 2 , andr < \Jl - (1 -8)%, then, for any w, 



\s{u)-sm\ < y/B\\\sh- 

with probability exceeding 1 — 5. 

Proof. Suppose the significant terms of signal S are u>i, where i = 1, . . . , B. 
Since S x {t) = S(t)xT(t) and thus S^u) = S(u) * Xt{u), then 



(6) 



B 

1=1 



Therefore 



B 
B 



(7) 



< 



B 



v E \XT^-^)\ 2 <\\Sh, 



B 



XT{Uj - UJi) 



Because p > 1+ ( jV l 1 ) Ar 2 . we have |x r (u;)| 2 < A with probability at least 1 — t 2 for any w ^ 0. 

This implies that \S x (uj) - S(uj)\ < \\S\\ 2 VBX with probability at least (1 - t 2 ) b > (1-5) 
Then 

|£ a (^) - <VBX\\S\\ 2 . (8) 

For those cu {tUi, i — 1, . . . , B}, 

B 

= ^2S(uj)x t (lu - 



i=i 



(9) 



and we conclude similarly that \S x (u>) — S(u) \ < VB\\\S\\ 2 ., with probability at least 1 — 5. □ 

We shall use Algorithm B.ll to estimate S x (u); we now look at how close the approximation A 
(i.e. the output of Algorithm 13 .11) of S x (oj) is to the true coefficient S(u). 
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Lemma 3.5. For a set of parameters T, S, Xrif), X, t, p as stated in Lemma \3.3\ ifp > i^mzm^ps j 
and r < a/1 — (1 — Sy/ B , then Alsorithm \3~T\ for signal S\(t) = S(t)xT(t) gives a good estima- 
tion A of S(u), such that 

\A- S(u)\ < Vb\)\\S\\ 2 . (10) 

with high probability. 

Proof. Lemma 4.2 in EDI says that the coefficient estimation algorithm returns A, such that 

lA-s^iKv^wsy. (ii) 

By Lemma EH 

\St(u)-S(uj)\<VBX\\S\\ 2 . (12) 

Thus ^ 

\A-S(u)\ < \A-S!{u)\ + \S!{u)-S(u)\ < (V\ + y/BX)\\S\\ 2 . (13) 



□ 



Finally, we derive the conclusion about estimating coefficients. 



Theorem 3.6. For a set of parameters T, S, Xrif), X, r, p as stated in Lemma \3~3\ if A < 2 {b+i) 
and p > i^jv-iUr^ ' every application of Algorithm \3. 1 \ produces. for each frequency to and 
each signal S, and each A > 0, with high probability, an output A (after inputting (S, ou, e) ), such 
that\A- S(lu)\ 2 < e\\S\\l 

\A-S(u)\<(y/\ + y/BX)\\S\\ 2 . (14) 



Proof. By Lemma 133 
Thus we have 



\A-S(uj)\ 2 <2(X + BX)\\S\\ 2 2 . (15) 
From the conditions 2 (A + BX) < e, it follows that 

\A-S(u;)\ 2 <e\\S\\l. (16) 

□ 

When we are able to get most of the data, the computational cost for estimating Fourier coef- 
ficients on unevenly spaced data is only slightly more than for the evenly spaced data case. The 
time to compute the signal value remains almost the same as for the evenly spaced data case. The 
extra time, in the worst case Q( g^^fz^y )^ comes from visiting unavailable data. Fortunately, the 
visit operation is very fast and therefore contributes little to the total time, especially when most of 
the data are available. 

Moreover, as in [20|, one can speed up the algorithm by using multi-step coarse-to-fine coeffi- 
cient estimation procedures, which turns out to be more efficient than single-step accurate estima- 
tion; the proof is entirely analogous to Lemma 4.3 in EDI . 



8 



3.2 Estimating Norms 



The basic idea for locating the label of a significant frequency is to compare the energies (i.e. the 
L 2 norm) of signals restricted in different frequency intervals. If the energy of some interval is 
relatively large, the significant mode is in that region with higher probability. We construct the 
following new signals to focus on certain intervals 

Hj(t) = xi(t)e^ * ^^((rtjeT * S (17) 

where 2qi + 1 is the filter width, j — 0, . . . , 15, a and 6 are random dilation and modulation factors. 
(Please see [20| for an explanation of the role of a and 6). For convenience, we denote Hj(t) by 
H(t). 

We need to evaluate values H(t) for random indices t E {0, . . . , N — 1}. Note that the signal 
H results from the convolutions of two finite bandwidth Box-car filters with the original signal S. 
Therefore, any missing point needed by the two convolutions would lead to a failure of computing 
F(t). The total number of signal points involved depends on the number of nonzero taps in these 
two filters. Moreover, random dilation and modulation factors of the second Box-car filter make 
computation more tricky. 

One naive way is to dive into the two convolutions and sample each signal point. If it is not 
available, stop evaluating this F(t) and start with a new index t. This definitely increases time 
cost by wasting abundant computation. For example, suppose five data are needed and only one 
of them is missing, then the algorithm may compute four data in vain in the worst case, where the 
missing data point is visited last in the sequence of 5. 

To avoid the above situation, we first compute the locations of all the points that will be needed 
for the convolution; only if they are all available will we start the computation. The locations 
related to the convolution are given in the following lemma. 

Lemma 3.7. Suppose we have a signal H(t) = (xi * (Xgi^ * S)^)^)^), where a ly a 2 , cr 3 , 
and (T4 are dilation factors. From the definition of Box car filter, the taps for xi lies in the interval 
[— 1, 1], the taps for x qi in [— qi,qi\, then in order to evaluate H(t), we need values of S with 
indices at oscr^t — a 3 aii — ja 2 , where integers i = — 1, . . . , 1, j = —qx, . . . , q\. 

Proof To evaluate H(t), first let signal r = * S) {a3 \ then 

i 

H(t) = ( X ( r ] * r)<«>(t) = XiMK^t - M (18) 

i=-l 

r(a 4 t - <ni) = (xj 2) * S)^\a A t - a x i) = ( X £ 2) * S)(a 3 a A t - a^i) 

Ql 

= 5^ Xqi(^2j)S((7 3 a 4 t - a 3 o x % - a 2 j). (19) 
j=-n 

Thus, in order to get the value of H(t), we need values of all S(t ), where t' = (y 3 o^t — o 3 (j\i — 023, 
with % = —1, . . . , 1 and j = —qi, . . . ,qi. □ 
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The scheme of the norm estimation algorithm is as follows. 
Algorithm 3.8. Norm Estimation 

Input: signal H, k = 0, the number of iterations M = 1.2 \n(l/5). 
While k < M: 

1. Randomly generate the index 

2. Compute all indices needed by the two convolutions: T = {t' , t = o^o^t — o^a\% — a 2 j}, 
where i = —1, . . . , 1 and j = —qi, . . . , q\. 

3. If all the points t' £ T are available, then compute H{tk) else go to step I and generate 
another index tk- 

4. estimate = 60-th percentile of the sequence {\H(tk)\ 2 N}, where k = 0, . . . , M — 1. 

If there exist satisfactory data groups, although maybe very few, the norm estimation will 
eventually find them. However, when most data are unavailable, the program may struggle in a 
long loop and take a huge amount of time. We introduce some tricks to avoid this. For example, 
set an upper bound MAX on the number of the loops. If it is reached, just use the sample points 
generated so far to estimate the norms. This technique may lead to a larger error, and thus hamper 
our frequency identification. However, by repeating the calculation, as stipulated by Lemma 3.2, 
we reduce the inaccuracy. Anyway we cannot hope to recover the signal, if p is too small. 

The following lemma investigates the number of repetitions to get a satisfactory data group for 
estimating norms. 

Lemma 3.9. Suppose x qi and \q 2 are tw° Box-car filters with numbers of taps 2g x + 1 and 2q 2 + 1 
respectively. Define D qi q2 = x qi * Xqi- Then D qi q2 has 2q 1 + 2q 2 + 1 nonzero taps in the time 
domain. 

Lemma 3.10. Randomly choose an index for signal H(t), then after k > log<5/log(l — (1 — 
p) 2qi+2q2+1 ) iterations, we can get at least one satisfactory index with high probability 1—5. 

Proof. It is easy to prove by Lemma l3~2l □ 

Here is a new scheme for estimating norms, which uses much fewer samples than the original 
one and still achieves good estimation. In ||2*U|| . we propose a lemma that enabled us to achieve a 
good norm estimation by only a few samples. The following lemma is its adaption to the case of 
unevenly spaced data. 

Lemma 3.11. If a signal H is 95% pure and ifr > 1.2 ln(l/5), the output of Algorithm \3. 8\ sives 
an estimation of its energy which exceeds ||if|| 2 /3 with probability exceeding 1 — 5. 
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Proof. The proof is very similar to that of Lemma 4.5 in [ 20 1 . We shall present only the difference 
of these two proofs. Suppose we sample r times for the signal H. Let k = {t : N\H(t)\ 2 < 
||if|| 2 /3}, with k c as its complement, we have 



ten 



1 1 



< \k\Y \H(t)\ 2 < M 2 --||# 



ten 



(20) 



On the other hand, we know that the signal is 95% pure, i.e. \H(u!q)\ 2 > 0.95||if || 2 for some ujq. 
By modulating, ujq can be moved to 0; therefore, we can, without loss of generality, suppose most 
of the energy concentrates at the frequency 0; then 



1 - 



H(0)\ 2 > 0.95||iJ|| 2 . 



(21) 



So we have 



£#(*) 



t£K L 



> V0MN\\H\\ - \k\-=\\H 



3N 



(22) 



On the other hand, | 52 teK c H(t)\ < \k, c \\\H\\ = (N - so that 

N- \k\ > { V0.95iV - 



(23) 



Let a = ^j-; the above inequality becomes 



a" 



+ (^3 - 2^0.95*3) a - 0.15 < 0. 



(24) 



Thus < a < 0.075. Define now a random variable X K = \^2f =1 ^ ^e useful to 

(25) 



estimate 



E(X K ) = y < 0.075, 



and the expectation of the random variable c zXk , 

E(e x « z ) = e Prob(x K (i) = 0) + e z Prob(x*(i) = 1) = 1 - a + ae z . 



(26) 



Suppose now we sample the signal H r times, and take the 60-th percentile of the numbers 
N\H(ti)\ 2 , . . . , N\H{t r )\ 2 . By Chernoff's standard argument and similar procedure of Lemma 
4.5 in 1201, we have for z > 0, 

Prob (60-th percentile < ^WW^J = [(1 - a)e-° Sz + ae 0Az ] r . 
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Take z = ln(1.25(l - a) /a), then 

(1 - a)e-° Sz + ae 0Az = 1.97a a6 (l - af A . (27) 

The right hand side of (35) is increasing in a on the interval [0, 0.075]; since a < 0.075, we obtain 
an upper bound by substituting 0.075 for a: 

[(1 - a)e-°- 6z + ae 0Az ] r = [l.97a°' 6 (l - a) 0A ] r < e~°- 90r . (28) 

For Prob (60-th percentile < |||//|| 2 ) < 5, we need r > 1.2 ln(l/<5), we have 

Prob(Output > \\H\\ 2 /3) = Prob(60-th percentile of N\H(t)\ 2 > \\H\\ 2 /3) > 1 - S. (29) 

□ 

This norm estimation procedure will be used repeatedly in the group testing step below. 

3.3 Isolation 

For a significant frequency in signal S, isolation aims to construct a series of new signals, such that 
this significant frequency becomes predominant in at least one of the new isolation signals. 

Lemma 3.12. Given signals S, Si, and the parameters as stated in Lemma [PI Suppose Fi(t) = 
Si(t) * X i(t) = (xAt)S(t)) * xM F(t) = S(t) * xM Ifp > 1+(JV ' 1)AT 2 , then for each u with 
\S(u)\ 2 > BX\\S\\ 2 , isolation algorithm can create a signal F*, such that 

\F*(oj)\ 2 > 0.98||F*|| 2 . (30) 

Proof Since |5(o;)| 2 > J? A 1 1 1 1 2 , we have ^(w)] > V BX\\S\\. Then there exists some n > 0, such 
that \S(u)\ > (y/fj + y/BX)\\S\\. LemmalTil states that \£Si(u) - §(u)\ < VBX\\S\\. Therefore 

|5 , iHI>V^||S , ||>V^||5 , i||. (31) 

Isolation algorithm returns F^ , . . . , F^ with k < 0{-), as described in [8|. For any uo with 
l^i^)! 2 > 7 7l|5'i|| 2 ' there exists some j, such that 

\F[ j \u)\ 2 > 0.98\\Fi j) \\ 2 . (32) 

Let F* = F[ j \ then 

\F*(u)\ 2 > 0.98HF!*!! 2 . (33) 

□ 

Theoretically, in order to capture a significant mode, we need 0(1/ n) signals. However, in 
practice, much fewer signals is enough to achieve this goal. 
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3.4 Group Testing 

Isolation has produced several signals, one of which contains the most significant frequency. Group 
testing uses repeated zoom-ins on one of the signals, and norm testing to select where to zoom in, 
in order to determine the frequency. The goal of group testing is thus to find the most significant 
mode of the signal F* from isolation. It uses recursive procedures MSB (Most Significant Bit) to 
approach this mode gradually. 

Definition: Denote a set {u; : (21 - l)JV/32 < uo < (21 + l)iV/32} by interval h 

Group test algorithm is given as follows. 

Algorithm 3.13. Group Testing 

Input isolation signal F* to F±°\ i = 0, q = 1 

While q < N, in the i-th iteration, 

1. Find the most significant bit v and the number of significant intervals c by the procedure 
MSB. 

2. Update i = i + 1, modulate the signal F{" by \_(v + 0.5)iV/16j and dilate it by a factor of 
[16/cJ. Store it in F[ i+1) . 

3. Call Group Test again with the new signal F^\ denote its output by g. 

4. Update the accumulation factor q = q * [16/ c\. 

5. Ifg > N/2, then g = g - N. 

6. return [g/[16/c\ + (v + 1/2)JV/16 + 0.5J (mod N); 
The MSB procedure is as follows. 

Algorithm 3.14. MSB (Most Significant Bit) 

(i) 

Input: signal F^ with length N, a threshold < rj < 1. 

1. Get a series of new signals Hj(t) = F^(t) * (e 2wijt/16 Xi), j = 0, . . . , 15. 

2. Estimate the energies ej ofHj, j = 0, . . . , 15. 

3. for I = 0, . . . , 15, compare the energies e\ with all other energies ej, where j = (I + 
4)mod 16, (I + 5)mod 16, ...,(/ + Y2)mod 16. If e/ > ej for all these j, label it as an 
interval with large energy. 

4. Find the longest consecutive intervals of large energies. Take their center as v, and the 
number of those intervals as c. 

5. Ifc<8, then do the original MSB in [8 1 to get v and set c = 8; 

6. Return the dilation-related factor c and the most significant bit v. 
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(i) 

For convenience, we denote F± by Fi. 

Lemma 3.15. Given a 98% pure signal F 1; suppose Gj(t) = e 2mjt l l6 Xi(f)- Then Alsorithm \3.l3\ 
with Algorithm \3. 14\ as its subroutine, can find the significant frequency uj\ of the signal Fi with 
high probability. 

Proof. The proof is similar to that of Lemma 5 in [8], with some changes: 

Since the signal Fi is 98% pure, there exist a frequency mode uj\ and a signal p, such that Fi = 
atfiuix + P, where \a\ 2 > 0.98||Fi|| 2 and ||p|| 2 < 0.02||Fi|| 2 . Without loss of generality, assume 
uji G [-iV/32, N/32]. The whole region is divided into 16 subintervals [jN/lQ - N/32JN/16 + 
N/32], where j — 0, . . . , 15. To estimate Fi * Go(coi) for |a>i| < N/32, we use that |Go(u;i)| = 
\XiM\ > 0.987 for |wi| < iV/32. It follows that 

|FT^GoK)| 2 = N Fi(wi)Go(a;i) 2 > A^O.987 2 |Fi (c^O | 2 > A^0.987 2 0.98||F! || 2 

> 0.954A^||F 1 || 2 > 0.954AT||F 1 G || 2 = 0.954||Fi * G f. 
Therefore the estimation X of ||Fi * G || satisfies: 

X > ||Fi * Go|| 2 /3 = ||fT*G || 2 /3 = |FT^GoH| 2 /3 > |fT*G (u;i)| 2 /3 

Ul 

> 0.954iV||Fi|| 2 /3 > 0.318iV||Fi|| 2 . 

Next consider the energy of Fi * G 4 . 

||/)G 4 || 2 = ^|pMG 4 M| 2 

Ul 

Since |G 4 (a>i)| < 0.464, we have 

|F 1 (wi)G 4 ( < |F 1 (cji)||G 4 ( wi)| < |Fi(wi)|0.464 < 0.464||Fi|| 
Also ||FiG 4 || 2 - itiMdiiwi)] 2 < 0.02||Fi|| 2 . Thus 

||FiG 4 || 2 < 0.464 2 ||F 1 || 2 + 0.02||Fi|| 2 = 0.24||Fi|| 2 . 

It follows that _____ 

||Fi * G 4 || 2 = ||f7Tg 4 || 2 = ^||FiG 4 || < 0.24iV||F 1 || 2 . 

Then we compare ||Fi * G 4 || 2 with the lower bound of the estimation of ||Fj * G || 2 , which is 

0.24A^||F 1 || 2 < O.SlSiVUFxH 2 , 

which is less than the estimation for ||Fi * G || 2 . In general, cu E intervalj, for j not necessarily 
0. Therefore we compare ||Fi * Gy || 2 with ||Fi * Gj || 2 , where \j — j \ > 4. If there is some j with 
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II Fi * Gj\\ 2 apparently larger than ||Fi * Gy || 2 , then we conclude uj\ interval •/. Otherwise, 
possibly lui G interval^. By the above argument, we can always eliminate 9 consecutive interval 
regions out of 16, leaving a cyclic interval of length at most 7N/16. The remaining proof is exactly 
the same as Lemma 8 in paper JSJl. □ 

Remark: In [20 1, we showed that group testing works for a Box-car filter with width more than 
21, i.e. k > 10. In that case, 2k + 1 intervals are sufficient. A similar conclusion still holds in the 
unevenly spaced data case. However, the lemma above proves the success of group testing under 
different conditions. In our proof, we use a Box-car filter with much shorter width, namely 3 in 
time domain; this works well if 16 intervals are taken. In practice, we use these shorter filters; we 
can usually (if B is small) get away with using much fewer intervals as well (e.g. 3 instead of 16). 

3.5 Adaptive Greedy Pursuit 

In summary, given a signal S, for an accuracy e and for B modes, we can find a very good approx- 
imation of the signal S by using Algorithm l2.2l 

Theorem 3.16. Given a signal S, an accuracy e, success probability 1 — 5, Algorithm \2.2\ can 
output a B-term representation R with sum- square -error \\S — R\\ 2 < (1 + e)\\S — R op t\\ 2 > where 
Ropt is the B-term representation for S with the least sum-square-error, with time and space cost 
poly(B,\og(N), l,log(l/ '6)) for computing and ^^^^ for just visiting 

samples. 

Proof. We omit the proof since it is very similar to Theorem 9 in [ 8 1 . □ 

4 NERA^SFA with Interpolation Technique 

The greedy algorithm described above is fast. When p is sufficiently large (e.g. p > 0.7), the 
approach proposed and discussed in the previous section works well. For smaller p, the amount 
of time wasted to find available sample groups becomes unacceptably long. For example, when 
B = 2, N = 100, p = 0.4, the algorithm couldn't find the signal within 200 greedy pursuit 
iterations. For this reason, we introduced an interpolation technique to get an approximate value 
of the missing point in the norm estimation procedure. This algorithm is efficient even in smaller 
p cases. 

4.1 Lagrange Interpolation Technique 

The task of interpolation is to estimate S(t) for arbitrary t by drawing a smooth curve through 
all the known points ifTTll. It is called interpolation when the desired t is between the largest and 
smallest of these Vs. We use Lagrange Polynomial Interpolation, one of the simplest and most 
popular interpolation techniques. 

Generally, the number of interpolation points determines the degree of a polynomial. A poly- 
nomial of higher degree is smoother with smaller approximation errors at the expense of more 
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computation. Thus we choose a second degree polynomial, as a balance between computational 
complexity and accuracy. It is given explicitly by Lagrange's classical formula. If the three nearest 
neighbors are (ti, S(ti)), (t 2 , S(t 2 )), (t 3 , S(t 3 )), the polynomial is 

w (*i-* 2 )(*i-* 3 ) {h - h)(t 2 - 1 3 ) K2J (t 3 - t 2 )(t 3 - h) ysj 

If S(t) is three times differentiable in an interval [a, b], and the points ti,t 2 ,h £ [a,b] are 
different, then there exists some v £ [a, b], such that the approximation error is S(t) — P(t) = 

^(t-m-t^t-ts). 



4.2 Estimate Norms with Interpolation 

We introduce the interpolation scheme into estimating norms. The idea is to estimate the value of 
a missing point by the Lagrange interpolation. The detailed algorithm for estimating norms is as 
follows. 

Algorithm 4.1. Estimate Norm with interpolation technique 
Input: signal H, k = 0, the maximum number of samples M. 

1. Randomly generate the index tk, where k — 0, . . . , M — 1. 

2. For each k, if H{t k ) is not available, estimate H{t k ) by Lagrange interpolation; else com- 
pute H{t k ) directly. 

3. Estimation = 60-th percentile of the sequence {\H(tk)\ 2 N}, where k = 0, . . . , M — 1. 

Note that we use interpolation only in norm estimation steps, where precision is less critical. 
With less precise norm estimation, the localization of important modes could still work well when 
iterated. For coefficient estimation, which needs to be more precise, we always search for available 
samples. 



5 Numerical Results 

In this section, we present striking numerical results of NERAfSFA, comparing to the Inverse Non- 
equispaced Fast Fourier Transform (INFFT) algorithms. The popular benchmark software NFFT 
version 2.0 is used to give performance of INFFT, with default CGNE_R method and Dirichlet 
kernel. Its time cost excludes the precomputation of samples values, which takes O(L). Numer- 
ical experiments show the advantage of our NERA^SFA algorithm in processing large amount of 
data. We begin in Section 5.1 with comparing NERA£SFA with INFFT for some one and two di- 
mensional examples with different length. In Section 5.2, the performance for different number of 
modes is shown. Finally, we test the capability of NERAfSFA to recover the signal in the situation 
with a large amount of missing data and in presence of large noise. 

All the experiments were run on an AMD Athlon(TM) XP1900+ machine with Cache size 
256KB, total memory 512 MB, Linux kernel version 2.4.20-20.9 and compiler gcc version 3.2.2. 
The numerical data is an average of 10 runs of the code; errors are given in the L? norm. 
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N 


INFFT 


NERAfSFA 


NERAFSFA 






(+sampling) 


(w/o sampling) 


2 9 =512 


0.01 


0.63 


0.31 


2 n =2048 


0.03 


0.77 


0.37 


2 13 =8192 


0.17 


0.90 


0.46 


2 15 =32768 


0.83 


0.93 


0.49 


2 17 =131072 


4.30 


1.03 


0.51 


2 19 =524288 


19.94 


1.20 


0.61 



Table 1: Experiments with fixed B = 8, p = 0.7, d = 1 (one dimension), and varying length N of 
signals; an i.i.d. white noise is added with a = 0.5, or SNR ~ 30dB (see text). For each length 
of the signal, 10 different runs were carried out; the average result is shown. We did all the tests 
for NERAfSFA with Lagrange interpolation, as explained in the text. Two kinds of time costs for 
NERAfSFA are provided. One is the total running time and another is the running time excluding 
the sampling time. The time of INFFT does not include the precomputation time for samples. 

5.1 Experiments with Different Length of Signals 

We ran the comparison for a 8-mode superposition signal S(t) = Ylf=i <?W plus white noise v with 
the standard deviation a = 0.5, damped by a factor of 1/yN, ( so that ||z/|| 2 = a 2 = 0.25; since 
||S|| 2 = 8, this implies SNR = 201og 10 32 « 30.1dB). Other parameters are B = 8, e = 0.02, 
5 = 0.01, andp = 70%. The missing data are randomly and uniformly distributed. NERAfSFA 
outperforms INFFT in speed when N is large; see Table [T] and Figure El The corresponding 
crossover point is N > 2 15 = 32768 . For example, to process 2 19 = 524, 288 data, more 
than nineteen minutes (estimated) are needed for INFFT versus approximately one second for 
NERAfSFA. Experiments support the theoretical conclusion that NERAfSFA would be faster than 
INFFT after some N for a sparse signal; whatever the sparsity, i.e. whatever the value of B, there 
always exists some crossover N. 

In two dimensions, we test a noisy 6-mode superposition signal S(t) = J2f=i + v i 

with B = 6, e = 0.02, 5 = 0.01, p = 80%, and a = 0.1. Missing data are randomly and 
uniformly distributed. As the number of grid points iV in each dimension grows, two dimensional 
NERAfSFA outperforms two dimensional INFFT at iV > 512, as Tableland FigureH|show. The 
crossover point becomes much smaller in high dimensions situation. It would not be surprising 
that for recovering a 6-mode three dimensional signal, NERAfSFA surpasses INFFT at a hundred 
sampling grid points in each dimension. 

5.2 Experiments with Different Number of Modes 

The number of modes has an important influence on the running time since the crossover point 
varies for signals with different B. To investigate this, we did the experiments with fixed iV = 
2 18 = 262144, p = 0.6 and varying B. As before, we take S to be a superposition of exactly B 
modes with white noise, i.e. S(t) = Yn=i c i4>ui + ^ w i tn standard deviation of noise a = 0.05. 
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Figure 2: Time Comparison between INFFT and NERA£SFA for different N with B = 8, p = 0.7, 
d — 1. The result in Table[T]is shown in the form of a graph here. The x coordinate is the log 2 (iV), 
the y coordinate presents the running time for each algorithm. NERA£SFA without sampling 
surpasses INFFT at iV = 2 14 = 16384. 



N 


INFFT 


NERA^SFA 


NERAiSFA 






(+sampling) 


(w/o sampling) 


128 


0.13 


2.86 


1.57 


256 


0.73 


2.60 


1.46 


512 


3.00 


3.70 


2.13 


1024 


11.59 


4.31 


2.94 


2048 


54.94 


6.56 


4.90 



Table 3: Experiments with fixed B = 6, p = 0.8, d = 2 (two dimensions), and varying length iV 
of signals; an i.i.d white noise is added with a = 0.1, or SNR ~ 56dB (see text). For each length 
of the signal, 10 different runs were carried out; the average result is shown. We did all the tests 
for NERA£SFA with two dimensional interpolation techniques as shown in the appendix. Again, 
two kinds of time costs for NERA£SFA, the one with and without sampling time is provided. The 
time of INFFT excludes the sampling time. 



18 



Time Comparison between NERAISFA and INFFT 
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Figure 4: Time comparison between INFFT and NERAISFA for different N with fixed B = 6, 
p = 0.8, d = 2. The x coordinate is the logarithm of length iV of signal in each dimension. 
INFFT is very fast when iV is relatively small and slows down quickly as N increases. On the 
contrary, it takes NERAISFA similar time to process small and large N problem. NERAISFA 
without sampling outperforms INFFT at iV = 2 8 5 =362. 

Available data are uniformly and randomly distributed. Tableland Figure ^compare the running 
time for different B using INFFT and NERAISFA. At first, NERA£SFA takes less time because N 
is so large. However, the execution time of INFFT keeps constant for different number of modes 
B, while that of modified RA^SFA is polynomial of higher order. INFFT is faster than NERA£SFA 
when B > 10. The regression techniques shows empirically that the order of B in NERAfSEA is 
greater than quadratic. This is one of the characteristics of this version of the RA^SFA algorithms 
and irrelevant to the nonequispaceness of the data. (A different version of RA£SFA in [9| is linear 
in B, but maybe less easily used when not all equispaced data are available. ) 

5.3 Experiments for Different Percentage of Missing Data 

The advantage of interpolation techniques is to recover a signal even when a large percentage 
of data is missing. Table |7] shows the recovery effect for a two-mode pure signal Ci^ Wl + C20 W2 , 
N = 10 6 with all the other parameters e and 5 the same as before. When the percentage of available 
data is large, both algorithms recover the signal well with similar running time. 

We tried another example of signal when iV = 100. NERAfSFA without interpolation tech- 
niques fails to recover the signal with high probability if more than 45% data are unavailable. In 
contrast, with the help of interpolation technique, the NERAISFA can always recover the signal 
with only 25% available data. 

Experiments also show that for NERAISFA with interpolation technique, the total number of 
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number of modes 


SNR 


NERA£SFA 


NERAiSFA 


INFFT 


B 


(dB) 


(+sampling) 


(w/o sampling) 




2 


58 


0.06 


0.01 


1.35 


4 


64 


0.24 


0.06 


1.35 


6 


68 


0.61 


0.23 


1.35 


8 


70 


1.44 


0.69 


1.35 


10 


72 


2.45 


1.39 


1.35 


13 


74 


5.78 


3.64 


1.35 


16 


76 


10.03 


7.17 


1.35 



Table 5: Experiments with fixed N = 2 18 , p = 0.6, d — 1 (one dimension), a = 0.05, and varying 
number of modes B of signals. For each length of the signal, 10 different runs were carried out; 
the average result is shown. We did all the tests for NERA£SFA with interpolation techniques. We 
present two different time costs of NERA^SFA, with and without sampling. 



Time of NERAISFA and INFFT for different B 




Number of Modes B 



Figure 6: Time Comparison between INFFT and NERAISFA for different B with with fixed iV = 
2 18 , p = 0.6, d = 1 (one dimension), a = 0.05, a graph of the result in Table[5] The x coordinate is 
the number of modes B, the y coordinate presents running time. The running time of NERAISFA 
is polynomial to B. In contrast, the time of INFFT keeps constant for different B, excluding 
precomputation for the samples. NERAISFA without sampling begins to be slower than INFFT at 
B = 10 for iV = 2 18 . 



20 



p 


Time of NERA^SFA 


success 


Time of NERAf SFA 


success 




(with interpolation) 


probability 


(w/o interpolation) 


probability 


1 


0.03 


100% 


0.03 


100% 


0.8 


0.04 


100% 


0.06 


100% 


0.6 


0.05 


100% 


0.49 


100% 


0.4 


0.05 


100% 


0.45 


100% 


0.3 


0.06 


100% 


- 


0% 


0.2 


0.06 


100% 




0% 


0.1 


0.07 


100% 




0% 


io- 2 


0.11 


100% 




0% 


10~ 3 


0.51 


100% 




0% 


10" 4 


4.58 


100% 




0% 


0.00002 


758.22 


97% 




0% 



Table 7: Experiments with fixed B = 2, N = 10 , no noise, and varying percentage of available 
data. Each entry is based on the average of 10 different runs. In each run, the number of iterations 
is limited to 200; (this also corresponds to a fixed limit to the number of samples taken.) the success 
probability indicates the number of runs in which all 6 modes were found. When only 30% of data 
is available, the NERA£SFA without interpolation cannot find all two significant modes within 200 
iterations. 

available data, instead of the percentage of available data determines the success probability. On 
the contrary, The success of NERA^SFA without interpolation is determined by the percentage. 

5.4 Experiments to Recover Noisy Signals 

To recover a signal from very noisy data is a challenging problem. The following tests are done 
for S(t) = J2f=i Ci&^i + v, B = 6, e = 0.02, N = 2 17 , p = 0.6, and different standard deviation 
a for noise. The amplitude of noise is still multiplied by a factor of 1 / yN. As Table [S] shows, 
NERAf SFA excels at extracting information from noisy data even in the case of small signal to 
noise ratio. 

6 Conclusion 

We provide a sublinear sampling algorithm that recovers, with high probability, a 5-term Fourier 
representation for an unevenly spaced signal. It is faster than any existed methods for processing 
sparse signals of large size. Moreover, it recovers the signal in the situation of large percentage of 
missing data or small signal to noise ratio. 
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a 


SNR 


Time of NERA£SFA 


Time of NERAiSFA 


Relative Error 


Success 




(dB) 


(+sampling) 


( w/o sampling) 


(%) 


probability 







0.48 


0.21 


0.02 


100% 


0.5 


27.60 


0.56 


0.22 


2.00 


100% 


1.0 


15.56 


0.87 


0.32 


4.50 


90% 


1.5 


8.53 


3.94 


1.59 


5.83 


80% 


2.0 


3.52 


4.78 


1.86 


7.67 


50% 


2.5 


-0.35 


7.96 


2.14 


8.50 


30% 



Table 8: Experiments with fixed B = 6, N = 2 , p = 0.6, and varying noise levels. For each 
noise level, 10 different runs were carried out; the average result is shown. In each run, the number 
of iterations is limited to 200; (this also corresponds to a fixed limit to the number of samples 
taken.) the success probability indicates the number of runs in which all 6 modes were found. The 
average relative error is the error of reconstructed signal with respect to the original signal. 
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Appendix 

How to interpolate the two dimensional data to get values for missing points 

In one dimension, values of missing points can be interpolated by its few nearest left and right 
available neighbors. The idea can be extended to higher dimensional cases with more techniques. 

For instance, in two dimensions, we first find four nearest available neighbors of a missing 
point in each quadrant. Suppose a missing point is (x, y), its four neighbors are (x 1 , yi), (x 2 , y 2 ), 
(x 3 , y 3 ), (X4, yi). The weights of neighbors can be derived by solving the following linear system 
of equations. 



(35) 



However, the matrix in d33t could be singular. In this case we choose the three nearest neigh- 
bors in different quadrants and use the following equations: 
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Figure 9: Some geometrical shapes of available neighboring points that occur most often. A 
missing point (denoted by a small cross) is at the center of the cross. Available points are denoted 
by dots. Left: the four available neighbors are located in the shape of cross. The distances of each 
neighbor to the missing point are equal. Right: almost the same as configuration in the left side, 
except one point moved off to the diagonal. 



The time to locate those nearest neighbors and compute corresponding weights is considered a 
part of precomputation and excluded from total running time. 

Note that we can use geometrical arguments to simplify the pre-computation of the weights. 
One easily sees that the system of equations d35t is translation invariant: the two linear system of 
equations 



/ Xi+l X 2 + l Xs + 1 x 4 + I \ I Wi \ / I \ 

Vl+P U2+P V3+P V4+P W 2 _ P 
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have the same solutions for any I and p. That means the location of the missing points does not 
influence the weights. Only the geometrical shape and relative distance of the available neighbors 
of a missing point matters. 

Thus, we compute weights for the geometrical shapes of available neighboring points which 
occur most often. As we go through every missing point, we check if the shape of its neighboring 
available points matches those popular ones; if it does, we can directly get the weights without 
computation. This saves a huge amount of work, especially when p is large. 



23 



V 


4 

P 


4p 4 (l-p)(2-p) 


sum:p 4 + 4p 4 (l — p)(2 — p) 


1 


100% 





100% 


0.9 


65% 


29% 


94% 


0.8 


41% 


39% 


80% 


0.7 


24% 


37% 


61% 


0.6 


13% 


29% 


42% 


0.5 


6% 


19% 


25% 



Table 10: Two possibilities corresponding to the geometrical shapes in Figure 9. The parameter p 
is the percentage of available data. The left side of Figure 9 happens with probability p 4 ; the right 
side appears with probability 4p 4 (l — p)(2 — p). 

For example, if the four neighboring points are located in the shape of a cross with the missing 
point as their center, as the left side of Figure |9] shows, then all of the weights are equal to one 
quarter. This situation happens with probability p 4 , which is almost 2/3 when p = 0.9. Another 
often occurring case typically has one of the four neighbors of the previous configuration moved 
off to the diagonal (see the right side of Figure^), which happens with probability 4p 4 (l —p) (2—p), 
i.e. about 28% when p = 0.9. In this case, the two neighbors on the same line as the mirroring 
points have a weight 0.5 respectively; the other two points have weight zero. Table [TO] shows the 
probabilities of these two situations as p varies. 
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